A review of probabilistic forecasting and prediction with machine learning
Predictions and forecasts of machine learning models should take the form of
probability distributions, so as to increase the amount of information
communicated to end users. Although applications of probabilistic prediction
and forecasting with machine learning models in academia and industry are
becoming more frequent, related concepts and methods have not been formalized
and structured under a holistic view of the entire field. Here, we review the
topic of predictive uncertainty estimation with machine learning algorithms, as
well as the related metrics (consistent scoring functions and proper scoring
rules) for assessing probabilistic predictions. The review covers a time period
spanning from the introduction of early statistical (linear regression and time
series models, based on Bayesian statistics or quantile regression) to recent
machine learning algorithms (including generalized additive models for
location, scale and shape, random forests, boosting and deep learning
algorithms) that are more flexible by nature. Reviewing the progress in the
field expedites our understanding of how to develop new algorithms tailored to
users' needs, since the latest advancements are based on fundamental
concepts applied to more complex algorithms. We conclude by classifying the
material and discussing challenges that are becoming a hot topic of research.
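To make the notion of a consistent scoring function concrete, the sketch below implements the pinball (quantile) loss, which is consistent for the tau-quantile functional; the data and the 0.9 level are illustrative placeholders, not values from the review.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss, a consistent scoring function for the
    tau-quantile functional: averaged over a test set, it is minimized
    in expectation by the true conditional tau-quantile."""
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# Illustrative usage: score hypothetical predictive 0.9-quantiles
# against observations (lower is better).
y_obs = np.array([2.3, 0.7, 1.9, 3.1])
q90 = np.array([3.0, 1.2, 2.5, 3.6])
print(pinball_loss(y_obs, q90, tau=0.9))
```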
Machine learning for uncertainty estimation in fusing precipitation observations from satellites and ground-based gauges
To form precipitation datasets that are accurate and, at the same time, have
high spatial densities, data from satellites and gauges are often merged in the
literature. However, uncertainty estimates for the data acquired in this manner
are scarcely provided, although the importance of uncertainty quantification in
predictive modelling is widely recognized. Furthermore, the benefits that
machine learning can bring to the task of providing such estimates have not
been broadly realized and properly explored through benchmark experiments. The
present study fills this gap by conducting the first benchmark tests on the
topic. On a large dataset comprising 15 years of monthly data spanning the
contiguous United States, we extensively
compared six learners that are, by their construction, appropriate for
predictive uncertainty quantification. These are the quantile regression (QR),
quantile regression forests (QRF), generalized random forests (GRF), gradient
boosting machines (GBM), light gradient boosting machines (LightGBM) and
quantile regression neural networks (QRNN). The comparison assessed the
learners' ability to issue predictive quantiles at nine levels that together
approximate the entire predictive probability distribution, and was primarily
based on the quantile and continuous ranked
probability skill scores. Three types of predictor variables (i.e., satellite
precipitation variables, distances between a point of interest and satellite
grid points, and elevation at a point of interest) were used in the comparison
and were additionally compared with each other. This additional comparison was
based on the explainable machine learning concept of feature importance. The
results suggest that the learners rank, from best to worst for the task
investigated, as follows: LightGBM, QRF, GRF, GBM, QRNN and QR.
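The study itself provides no code; as a hedged illustration of how one of the compared learners can issue predictive quantiles at several levels, the sketch below fits LightGBM's quantile objective once per level on synthetic stand-in data (the predictors and the levels are placeholders, not the study's setup).

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # stand-ins for satellite, distance and elevation predictors
y = X[:, 0] + rng.gamma(2.0, 1.0, 500)   # skewed noise, loosely precipitation-like

# Nine quantile levels approximating the predictive distribution
# (illustrative; the paper's exact levels may differ).
levels = [0.05, 0.10, 0.20, 0.30, 0.50, 0.70, 0.80, 0.90, 0.95]

# One model per level: LightGBM's quantile objective minimizes the
# pinball loss at level alpha.
models = {
    tau: lgb.LGBMRegressor(objective="quantile", alpha=tau,
                           n_estimators=200).fit(X, y)
    for tau in levels
}

# Each fitted model predicts one conditional quantile of the response.
preds = {tau: m.predict(X[:5]) for tau, m in models.items()}
```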
Ensemble learning for blending gridded satellite and gauge-measured precipitation data
Regression algorithms are regularly used for improving the accuracy of
satellite precipitation products. In this context, ground-based measurements
are the dependent variable and the satellite data are the predictor variables,
together with topographic factors. Alongside this, it is increasingly recognised
in many fields that combinations of algorithms through ensemble learning can
lead to substantial predictive performance improvements. Still, a sufficient
number of ensemble learners for improving the accuracy of satellite
precipitation products, together with a large-scale comparison of them, is
currently missing from the literature. In this work, we fill this gap by proposing 11
new ensemble learners in the field and by extensively comparing them for the
entire contiguous United States and for a 15-year period. We use monthly data
from the PERSIANN (Precipitation Estimation from Remotely Sensed Information
using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE
Retrievals for GPM) gridded datasets. We also use gauge-measured precipitation
data from the Global Historical Climatology Network monthly database, version 2
(GHCNm). The ensemble learners combine the predictions of six regression
algorithms (base learners), namely the multivariate adaptive regression splines
(MARS), multivariate adaptive polynomial splines (poly-MARS), random forests
(RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and
Bayesian regularized neural networks (BRNN), and each of them is based on a
different combiner. The combiners include the equal-weight combiner, the median
combiner, two best learners and seven variants of a sophisticated stacking
method. The latter stacks a regression algorithm on top of the base
learners to combine their independent predictions...
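As a rough sketch of the three families of combiners mentioned above (equal-weight, median and stacking), the code below combines a matrix of base-learner predictions; the data are synthetic stand-ins, and the linear meta-learner is one simple stacking choice rather than any of the paper's seven variants.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# P holds predictions of k = 6 base learners (stand-ins for MARS,
# poly-MARS, RF, GBM, XGBoost and BRNN) on n points; y holds the
# gauge-measured target. Both are synthetic here.
rng = np.random.default_rng(1)
y = rng.normal(size=200)
P = y[:, None] + rng.normal(scale=0.5, size=(200, 6))

equal_weight = P.mean(axis=1)       # equal-weight combiner
median_comb = np.median(P, axis=1)  # median combiner

# Stacking: a regression algorithm fitted on top of the base learners'
# predictions. In practice the meta-learner should be trained on
# out-of-fold predictions to avoid information leakage.
meta = LinearRegression().fit(P, y)
stacked = meta.predict(P)
```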
Deep Huber quantile regression networks
Typical machine learning regression applications aim to report the mean or
the median of the predictive probability distribution, via training with a
squared or an absolute error scoring function. The importance of issuing
predictions of additional functionals of the predictive probability distribution
(quantiles and expectiles) has been recognized as a means to quantify the
uncertainty of the prediction. In deep learning (DL) applications, this is
possible through quantile and expectile regression neural networks (QRNN and
ERNN respectively). Here we introduce deep Huber quantile regression networks
(DHQRN) that nest QRNNs and ERNNs as edge cases. DHQRN can predict Huber
quantiles, which are more general functionals in the sense that they nest
quantiles and expectiles as limiting cases. The main idea is to train a deep
learning algorithm with the Huber quantile regression function, which is
consistent for the Huber quantile functional. As a proof of concept, DHQRN are
applied to predict house prices in Australia. In this context, the predictive
performances of three DL architectures are discussed, along with an evidential
interpretation of the results from an economic case study.
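A hedged sketch of a tilted Huber loss of the kind described above follows; the parameterization (a single threshold a) is a common choice and not necessarily the paper's exact definition, but it shows how the quantile loss (a -> 0, after rescaling by 1/a) and the expectile loss (a -> infinity) arise as limiting cases.

```python
import numpy as np

def huber_quantile_loss(y_true, y_pred, tau, a):
    """Tilted Huber loss at level tau with threshold a (an assumed
    single-threshold parameterization). As a -> 0 it behaves like a
    rescaled pinball loss; as a -> inf it approaches the expectile loss."""
    u = y_true - y_pred
    # Classical Huber function: quadratic near zero, linear in the tails.
    rho = np.where(np.abs(u) <= a, 0.5 * u**2, a * (np.abs(u) - 0.5 * a))
    # Asymmetric tilting by the level tau, as in quantile/expectile losses.
    weight = np.abs(tau - (u < 0).astype(float))
    return np.mean(weight * rho)

# Such a loss could serve as the training objective of a deep network,
# in the spirit of the DHQRN idea sketched in the abstract.
```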
Twenty-three unsolved problems in hydrology (UPH) – a community perspective
This paper is the outcome of a community initiative to identify major unsolved scientific problems in hydrology, motivated by a need for stronger harmonisation of research efforts. The procedure involved a public consultation through online media, followed by two workshops through which a large number of potential science questions were collated, prioritised, and synthesised. In spite of the diversity of the participants (230 scientists in total), the process revealed much about community priorities and the state of our science: a preference for continuity in research questions rather than radical departures or redirections from past and current work. Questions remain focussed on process-based understanding of hydrological variability and causality at all space and time scales.
Increased attention to environmental change drives a new emphasis on understanding how change propagates across interfaces within the hydrological system and across disciplinary boundaries. In particular, the expansion of the human footprint raises a new set of questions related to human interactions with nature and water cycle feedbacks in the context of complex water management problems. We hope that this reflection and synthesis of the 23 unsolved problems in hydrology will help guide research efforts for some years to come.